Description

FIELD OF THE INVENTION 

The present invention relates to fusion proteins and a process for preparing fusion proteins. The invention also pertains to various oligonucleotide and amino acid sequences which make up proteins of the present invention.

BACKGROUND OF THE INVENTION 

Proteins which, in addition to the desired protein, also contain an undesirable or "ballast" constituent in the end product are referred to as fusion proteins. When proteins are prepared by genetic engineering, the intermediate stage of a fusion protein is utilized particularly if, in direct expression, the desired protein is decomposed relatively rapidly by host-endogenous proteases, causing reduced or entirely inadequate yields of the desired protein.

The magnitude of the ballast constituent of the fusion protein is usually selected in such a manner that an insoluble fusion protein is obtained. This insolubility not only provides the desired protection against the host-endogenous proteases but also permits easy separation from the soluble cell components. It is usually accepted that the proportion of the desired protein in the fusion protein is relatively small, i.e. that the cell produces a relatively large quantity of "ballast".

The preparation of fusion proteins with a short ballast constituent has been attempted. For example, a gene fusion was prepared which codes for a fusion protein from the first ten amino acids of β-galactosidase and somatostatin. However, it was observed that this short amino acid chain did not adequately protect the fusion protein against decomposition by the host-endogenous proteases (US-A 4 366 246, column 15, paragraph 2).

From EP-A 0 290 005 and 0 292 763, we know of fusion proteins, the ballast constituent of which consists of a 
galactosidase fragment with more than 250 amino acids. These fusion proteins are insoluble, but they can easily be 
rendered soluble with urea (EP-A 0 290 005). 

Although fusion proteins have been described in the art, the generation of fusion proteins with desirable traits such as protease resistance is a laborious procedure and often results in fusion proteins that have a number of undesirable characteristics. Thus, a need exists for an efficient process for producing fusion proteins with a number of attractive traits including protease resistance, proper folding, and effective cleavage of the ballast from the desired protein.

SUMMARY OF THE INVENTION

The present invention relates to a process for the preparation of fusion proteins. Fusion proteins of the present invention contain a desired protein and a ballast constituent. The process of the present invention involves generating an oligonucleotide library (mixture) coding for ballast constituents, inserting the mixed oligonucleotide (library) into a vector so that the oligonucleotide is functionally linked to a regulatory region and to the structural gene coding for the said desired protein, and transforming host cells with the so-obtained vector population. Transformants are then selected which express a fusion protein in high yield.

The process of the present invention further includes an oligonucleotide coding for an amino acid or for a group of amino acids which allows easy cleavage of the desired protein from the said ballast constituent. The cleavage may be enzymatic or chemical.

The invention also pertains to an oligonucleotide designed so that it leads to an insoluble fusion protein which can easily be solubilized. Fusion proteins of the present invention thus fulfill the requirements established for protease resistance.

Furthermore, an oligonucleotide of the present invention may be designed so that the ballast constituent does not interfere with folding of the desired protein.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 and its continuation in Figure 1a and Figure 1b show the construction of plasmid population (gene bank) pINT4x from the known plasmid pH154/25* via plasmid pINT40. Other constructions have not been graphically presented because they are readily apparent from the figures.

Figure 2 is a map of plasmid pUH10 containing the complete HMG CoA reductase gene.
Figures 3 and 3a show construction of pIK4, a plasmid containing the mini-proinsulin gene.

DETAILED DESCRIPTION OF THE INVENTION

The invention relates to a process for the preparation of a fusion protein characterized in that a mixed oligonucleotide is constructed which codes for the ballast constituent of the fusion protein. The oligonucleotide mixture is introduced in a vector in such a manner that it is functionally linked to a regulatory region and to the structural gene for the desired protein. Appropriate host cells are transformed with the plasmid population obtained in this manner, and the clones producing a high yield of coded fusion protein are selected. Advantageous embodiments of this invention are explained below:

The oligonucleotide advantageously codes at the 3'-end for an amino acid or a group of amino acids which permits easy and preferably enzymatic cleavage of the ballast constituent from the desired protein. According to another embodiment, an oligonucleotide is constructed that yields an insoluble fusion protein which can easily be made soluble. In particular, an oligonucleotide is preferably constructed which codes for a ballast constituent that does not disturb the folding of the desired protein.

For practical reasons, the construction, according to the invention, of the oligonucleotide for the ballast constituent causes the latter to be very short.

It was surprising to observe that, even when they have an extremely short ballast constituent, fusion proteins not only fulfill the requirements established for protease resistance, but are also produced at a high expression rate and, if an insoluble fusion protein is desired, can easily be rendered soluble. In the dissolved or soluble state, the short ballast constituent according to the invention then permits a sterically favorable conformation of the desired protein so that it can be properly folded and easily separated from the ballast constituent.

If the desired protein is formed in a pro-form, the ballast constituent can be constituted in such a manner that its cleavage can occur concomitantly with the transformation of the pro-protein into the mature protein. In insulin preparation, for example, the ballast constituent and the C chain can be removed simultaneously, yielding a derivative of the mature insulin which can be transformed into insulin without any side reactions involving much loss.

The short ballast constituent according to the invention is actually shorter than the usual signal sequences of 
proteins and does not disturb the folding of the desired protein. It therefore need not be eliminated prior to the final 
processing step yielding the mature protein. 

The oligonucleotide coding for the ballast constituent preferably contains the DNA sequence (coding strand)

(DCD)x

in which D stands for A, G or T and x is 4-12, preferably 4-8.

In particular, the oligonucleotide is characterized by the DNA sequence (coding strand)

ATG (DCD)y (NNN)z

in which N in the NNN triplet stands for identical or different nucleotides, excluding stop codons, z is 1-4 and y+z is 6-12, preferably 6-10, wherein y is at least 4. It has proved advantageous for the oligonucleotide to have the DNA sequence (coding strand)

ATG (DCD)5-8 (NNN)

especially if it has the DNA sequence (coding strand)

ATG GCW (DCD)4-8 CGW

or, advantageously

ATG GCA (DCD)4-7 CGW

in which W stands for A or T.
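The length constraints on the model sequence ATG (DCD)y (NNN)z can be checked mechanically. The following is a minimal sketch, not part of the original disclosure: the function name `ballast_counts` is hypothetical, the leading run of DCD triplets is assigned to y greedily (an assumption, since an NNN triplet may itself happen to match DCD), and the test sequence is the pINT41 ballast sequence listed in Table 1.

```python
import re

DCD = "[AGT]C[AGT]"  # D is the IUPAC code for A, G or T
STOP_CODONS = {"TAA", "TAG", "TGA"}


def ballast_counts(coding_strand):
    """Return (y, z) if the strand fits ATG (DCD)y (NNN)z, otherwise None."""
    seq = coding_strand.replace(" ", "").upper()
    if len(seq) % 3 != 0 or not seq.startswith("ATG"):
        return None
    codons = [seq[i:i + 3] for i in range(3, len(seq), 3)]
    # Assumption: the longest leading run of DCD triplets counts as y,
    # and every remaining triplet counts towards z.
    y = 0
    while y < len(codons) and re.fullmatch(DCD, codons[y]):
        y += 1
    tail = codons[y:]
    z = len(tail)
    if any(codon in STOP_CODONS for codon in tail):
        return None
    # Limits stated in the text: y >= 4, z = 1-4, y + z = 6-12.
    if y >= 4 and 1 <= z <= 4 and 6 <= y + z <= 12:
        return y, z
    return None


pint41_counts = ballast_counts("ATG GCA ACA ACA TCA ACA GCA ACT ACG CGT")
print(pint41_counts)  # (8, 1)
```

Under this reading the pINT41 sequence has eight DCD triplets followed by the single arginine codon CGT, which also fits the narrower form ATG (DCD)5-8 (NNN).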

The above-mentioned DNA model sequences fulfill all of these requirements. Codon DCD codes for the amino acids serine, threonine and alanine and therefore for a relatively hydrophilic protein chain. Stop codons are excluded and selection of the amino acids remains within a manageable scope. The following is a particularly preferred embodiment of the DNA sequence for the ballast constituent, especially if the desired protein is proinsulin:

ATG GCW (DCD)y' ACG CGW

or

ATG GCD (DCD)y' ACG CGT

in which y' signifies 3 to 6, especially 4 to 6.

The second codon, GCD, codes for alanine and completes the recognition sequence for the restriction enzyme NcoI, provided that the preceding regulatory sequence ends with CC. The next to last triplet codes for threonine and, together with the codon CGT for arginine, represents the recognition sequence for restriction enzyme MluI. Consequently, this oligonucleotide can be easily and unambiguously incorporated in gene constructions.
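Both statements, that DCD encodes only serine, threonine and alanine and that the preferred sequence carries NcoI and MluI recognition sequences, can be verified with a few lines. This is an illustrative sketch only: the concrete family member chosen (GCA as second codon, three ACA triplets) is an arbitrary example, and the recognition sequences CCATGG (NcoI) and ACGCGT (MluI) are the standard ones.

```python
from itertools import product

D = "AGT"  # IUPAC code D: A, G or T

# Standard genetic code, restricted to the nine possible DCD triplets.
AMINO_ACID = {
    "ACA": "Thr", "ACG": "Thr", "ACT": "Thr",
    "GCA": "Ala", "GCG": "Ala", "GCT": "Ala",
    "TCA": "Ser", "TCG": "Ser", "TCT": "Ser",
}

dcd_codons = [first + "C" + third for first, third in product(D, D)]
encoded = sorted({AMINO_ACID[codon] for codon in dcd_codons})
print(encoded)  # ['Ala', 'Ser', 'Thr']

# One arbitrary member of ATG GCD (DCD)y' ACG CGT with y' = 3, preceded by
# the "CC" with which the regulatory sequence is assumed to end.
construct = "CC" + "ATG" + "GCA" + "ACA" * 3 + "ACG" + "CGT"
has_ncoi_site = "CCATGG" in construct  # CC + ATG + G of the second codon
has_mlui_site = "ACGCGT" in construct  # ACG (Thr) + CGT (Arg)
print(has_ncoi_site, has_mlui_site)  # True True
```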

The (NNN)z group codes in the 3' position for an amino acid or a group of amino acids that permits simple, and preferably enzymatic, separation of the ballast constituent from the subsequent desired protein. It is expedient to select the nucleotides in this group in such a manner that at the 3'-end they code the cleavage site of a restriction enzyme which permits linkage of the structural gene for the desired protein. It is also advantageous for the ATG start codon and if necessary the first DCD triplet to be incorporated into the recognition sequence of a restriction enzyme so that the gene for the ballast constituent according to the invention can easily be inserted in the usual vectors.

The upper limit of z is obtained on the one hand from the desired cleavage site for (enzymatic) cleavage of the fusion protein obtained, i.e. it encompasses codons, for example, for the amino acid sequence Ile-Glu-Gly-Arg, in case cleavage is to be carried out with factor Xa. In general, the upper limit for the sum of y and z is 12, since the ballast constituent should of course be as small as possible and, above all, not interfere with the folding of the desired protein.

For reasons of expediency, bacteria or lower eukaryotic cells such as yeasts are preferred as the host organism in genetic engineering processes, provided that higher organisms are not required. In these processes, the expression of the heterologous gene is regulated by a homologous regulatory region, i.e. one that is intrinsic to the host or compatible with the host cell. If a pre-peptide is expressed, it often occurs that the pre-sequence is also heterologous to the host cell. In practice, this lacking "sequence harmony" frequently results in variable and unpredictable protein yields. Since the ballast sequence according to the invention is adapted to its environment, the selection process according to the invention yields a DNA construction characterized by this "sequence harmony".

The beginning and end of the ballast constituent are set in this construction: methionine is at the beginning, and an amino acid or a group of amino acids that permit the desired separation of the ballast constituent from the desired protein is at the end. If, for example, the desired protein is proinsulin, a triplet coding for arginine is advantageously selected as NNN, i.e. as the last codon, as this permits the particularly favorable simultaneous cleaving off of the ballast constituent with the removal of the C chain. Of course, the end of the ballast constituent can also be an amino acid or a group of amino acids which allows a chemical cleavage, e.g. methionine, so that cleavage is possible with cyanogen bromide or chloride.

The intermediate amino acid sequence should be as short as possible so that folding of the desired protein is not affected. Moreover, this chain should be relatively hydrophilic so that solubilization is facilitated with undissolved fusion proteins and the fusion protein remains soluble. Cysteine residues are undesirable since they can interfere with the formation of the disulfide bridges.

The DNA coding for the ballast constituent is synthesized in the form of a mixed oligonucleotide; it is incorporated in a suitable expression plasmid immediately in front of the structural gene for the desired protein and E. coli is transformed with the gene bank obtained in this manner. Appropriate gene structures can be obtained in this way by the selection of bacterial clones that produce corresponding fusion proteins.

It was previously mentioned that the cleavage sites for the restriction enzymes at the beginning and end of the nucleotide sequence coding for the ballast constituent are to be regarded as examples only. Recognition sequences that encompass the start codon ATG, and in which any nucleotides that follow may include the codon for suitable amino acids, are, by way of example, also those for restriction enzymes AflIII, NdeI, NlaIII, NspHI or StyI. Since in the preferred embodiment arginine is to be at the end of the ballast sequence and since there are six different codons for arginine, additional appropriate restriction enzymes can also be found here for use instead of MluI, for example NruI, AvrII, AflIII, ClaI or HaeII.

However, it is also advantageous to use a "polymerase chain reaction" (PCR) according to Saiki, R.K. et al., Science 239:487-491, 1988, which can dispense with the construction of specific recognition sites for restriction enzymes.

It was previously indicated that limitation to the DNA sequence (DCD)x is for reasons of expediency and that this does not rule out other codons such as, for example, those for glycine, proline, lysine, methionine or asparagine.

The most efficient embodiment of this DNA sequence is obtained by selection of good producers of the fusion protein, i.e., the fusion protein containing proinsulin. This yields the most favorable combination of regulation sequence, ballast sequence and desired protein, as a result of which unfavorable combinations of promoter, ballast sequence and structural gene are avoided and good results are obtained with minimum expenditure in terms of the above-mentioned "sequence harmony".

Surprisingly, it was observed that the genes optimized for the ballast constituent according to the invention do not always contain the triplets preferred by E. coli. It was found that for Thr, codon ACA, which is used least frequently by E. coli, actually occurs frequently in the selected sequences. If, for example, the following amino acid sequence were optimized according to the preferred codon usage (p.c.u.) by E. coli (p.c.u.: Aota, S. et al., Nucleic Acids Research 16 (supplement): r315, r316, r391, r402 (1988)), we would obtain a totally different gene structure than that obtained according to the invention (cf. Table 1):

Ala  Thr  Thr  Ser  Thr  Ala  Thr  Thr
GCG  ACC  ACC  AGC  ACC  GCG  ACC  ACC   p.c.u.
GCA  ACA  ACA  TCA  ACA  GCA  ACT  ACG   invention
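The comparison of the two codon rows can be reproduced as follows. This is a small illustrative sketch; the codon table contains only the standard-code entries needed for these eight triplets, and the variable names are hypothetical.

```python
# Standard genetic code entries for the codons that occur in the two rows.
CODON_TABLE = {
    "GCG": "Ala", "GCA": "Ala",
    "ACC": "Thr", "ACA": "Thr", "ACT": "Thr", "ACG": "Thr",
    "AGC": "Ser", "TCA": "Ser",
}

pcu_row = "GCG ACC ACC AGC ACC GCG ACC ACC".split()        # preferred codon usage
invention_row = "GCA ACA ACA TCA ACA GCA ACT ACG".split()  # selected sequence

pcu_peptide = [CODON_TABLE[codon] for codon in pcu_row]
invention_peptide = [CODON_TABLE[codon] for codon in invention_row]
differing_codons = sum(a != b for a, b in zip(pcu_row, invention_row))

print(pcu_peptide == invention_peptide)  # True: same amino acid sequence
print(differing_codons)                  # 8: every codon differs
```

Both rows encode Ala-Thr-Thr-Ser-Thr-Ala-Thr-Thr, yet none of the eight codons selected according to the invention coincides with the codon that preferred codon usage would dictate.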

In the case of the fusion proteins with a proinsulin constituent, the initial starting point was a ballast constituent with 10 amino acids. The DNA sequence of the best producer then served as the base for variations in this sequence, whereupon it was noted that up to 3 amino acids can be eliminated without a noticeable loss in the relative expression rate. This finding is not only surprising, since it was unexpected that such a short ballast protein would be adequate, but also very advantageous since of course the relative proportion of proinsulin in the fusion protein increases as the ballast constituent decreases.

The significance of the ballast constituent in the protein is apparent from the following comparison: Human proinsulin contains 86 amino acids. If, for a fusion protein according to EP-A 0 290 005, we take the lower limit of 250 amino acids for the ballast constituent, the fusion protein has 336 amino acids, only about one quarter of which occur in the desired protein. By comparison, a fusion protein according to the invention with only 7 amino acids in the ballast constituent has 93 amino acids, of which the proinsulin constituent amounts to 92.5%. If the desired protein has many more amino acids than proinsulin, the relationship between ballast and desired protein becomes even more favorable.
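The proportions quoted in this comparison follow from simple arithmetic on residue counts. A minimal sketch, with a hypothetical helper name and with the 461-residue HMG domain mentioned in the description added as a third case:

```python
def desired_fraction(desired_len, ballast_len):
    """Fraction of the fusion protein's residues that belong to the desired protein."""
    return desired_len / (desired_len + ballast_len)


PROINSULIN = 86   # amino acids in human proinsulin
HMG_DOMAIN = 461  # amino acids in the active HMG domain

long_ballast = round(100 * desired_fraction(PROINSULIN, 250), 1)
short_ballast = round(100 * desired_fraction(PROINSULIN, 7), 1)
hmg_short_ballast = round(100 * desired_fraction(HMG_DOMAIN, 7), 1)

print(long_ballast)       # 25.6 (about one quarter)
print(short_ballast)      # 92.5
print(hmg_short_ballast)  # 98.5
```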

It has been mentioned on a number of occasions that as a desired protein proinsulin represents only one preferred embodiment of the invention. However, the invention also works with much larger fusion proteins, for which a fusion protein with the active domain of human 3-hydroxy-3-methylglutaryl-coenzyme A-reductase (HMG) is mentioned as an example. This protein contains 461 amino acids. A gene coding for the latter is known e.g. from EP-A 292 803.

Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

Example 1 

Construction of the gene bank and selection of a clone with high expression

If not otherwise indicated, all media are prepared according to Maniatis, T.; Fritsch, E. F. and Sambrook, J.: Molecular Cloning, Cold Spring Harbor Laboratory (1982). TP medium consists of M9CA medium but with a glucose and casamino acid content of 0.4% each. If not otherwise indicated, all media contain 50 µg/ml ampicillin. Bacterial growth during fermentation is determined by measurement of the optical density of the cultures at 600 nm (OD). Percentage data refer to weight if no other data is reported.

The starting material is plasmid pH154/25* (figure 1), which is known from EP-A 0 211 299, herein incorporated by reference. This plasmid contains a fusion protein gene (D'-Proin) linked to a trp-promoter and a resistance gene for resistance against the antibiotic ampicillin (Amp). The fusion protein gene codes for a fusion protein that contains a fragment of the trpD-protein from E. coli (D') and monkey proinsulin (Proin). The gene structure of the plasmid results in a polycistronic mRNA, which codes for both the fusion protein and the resistance gene product. To suppress the formation of excess resistance gene product, initially the (commercial) trp-transcription terminator sequence (trpTer) (2) is introduced between the two structural genes. To do so, the plasmid is opened with EcoRI and the protruding ends are filled in with Klenow polymerase. The resulting DNA fragment with blunt ends is linked with the terminator sequence (2)
5' AGCCCGCCTAATGAGCGGGCTTTTTTTT 3'
3' TCGGGCGGATTACTCGCCCGAAAAAAAA 5'   (2)

which results in plasmid pINT12 (figure 1-(3)).
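The two strands of terminator sequence (2) can be checked for exact complementarity. The sketch below is illustrative only; it additionally looks for the inverted repeat followed by a run of T residues, a feature generally associated with intrinsic transcription terminators. The index positions used are simply read off the printed sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

upper = "AGCCCGCCTAATGAGCGGGCTTTTTTTT"  # upper strand, written 5'->3'
lower = "TCGGGCGGATTACTCGCCCGAAAAAAAA"  # lower strand, written 3'->5'

strands_pair = upper.translate(COMPLEMENT) == lower
print(strands_pair)  # True

# Inverted repeat: the first seven bases can pair with bases 15-21 of the same strand.
stem_left = upper[:7]
stem_right = upper[14:21]
forms_hairpin = stem_left.translate(COMPLEMENT)[::-1] == stem_right
t_run = len(upper) - len(upper.rstrip("T"))
print(forms_hairpin, t_run)  # True 8
```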

The starting plasmid pH154/25* contains a cleavage site for enzyme PvuI in the Amp gene, as well as a HindIII cleavage site in the carboxy-terminal area of the trpD fragment. Both cleavage sites are therefore also contained in pINT12. By cutting the plasmid (figure 1-(3)) with PvuI and HindIII, it is split into two fragments from which the one containing the proinsulin gene (figure 1-(4)) is isolated. Plasmid pGATTP (figure 1a-(5)), which is structured in an analogous manner to (3) but which instead of the D'-Proin gene carries a gamma-interferon gene (Ifn) containing restriction cleavage sites NcoI and HindIII, is also cut with PvuI and HindIII and the fragment (figure 1a-(6)) with the promoter region is isolated. By ligation of this fragment (6) with the fragment (4) obtained from (3), we acquire plasmid pINT40 (figure 1a-(7)). The small fragment with the remainder of the gamma-interferon gene is cut from the latter with NcoI and MluI. The large fragment (figure 1b-(8)) is ligated with mixed oligonucleotide (9)

5' CATGGCDDCDDCDDCDDCDDCDDCDA 3'
3'     CGHHGHHGHHGHHGHHGHHGHTGCGC 5'   (9)

in which D stands for A, G or T and H signifies the complementary nucleotide. This results in plasmid population (gene bank) pINT4x (figure 1b-(10)). Mixed oligonucleotides of the present invention may be obtained by techniques well known to those of skill in the art.
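The theoretical size of the gene bank encoded by mixed oligonucleotide (9) can be estimated from the number of degenerate positions. This is a back-of-the-envelope sketch, assuming that each D position is occupied by A, G or T independently; the variable names are hypothetical.

```python
top_strand = "CATGGCDDCDDCDDCDDCDDCDDCDA"  # upper strand of oligonucleotide (9), 5'->3'

# Each D position can be A, G or T.
degenerate_positions = top_strand.count("D")
dna_variants = 3 ** degenerate_positions

# Read in frame from the ATG start codon, which begins at the second base.
frame = top_strand[1:]
codons = [frame[i:i + 3] for i in range(0, len(frame) - len(frame) % 3, 3)]

# GCD always encodes alanine; each DCD triplet encodes Ser, Thr or Ala.
variable_codons = codons.count("DCD")
peptide_variants = 3 ** variable_codons

print(codons)                              # ['ATG', 'GCD', 'DCD', 'DCD', 'DCD', 'DCD', 'DCD', 'DCD']
print(degenerate_positions, dna_variants)  # 13 1594323
print(variable_codons, peptide_variants)   # 6 729
```

Under these assumptions the bank comprises roughly 1.6 million DNA sequences that encode only 729 distinct ballast peptides, so the same peptide can be represented by many different codon combinations.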

The mixed oligonucleotide (9) is obtained from the synthetic mixed oligonucleotide (9a)

TTCGGGTACCGHHGHHGHHGHHGHHGHHGHTGCGCAG 5'
TTGCCCATGGC 3'   (9a)

which is filled in with Klenow polymerase and cut with MluI and NcoI.

The strain E. coli W3110 is transformed with the plasmid population (10) and the bacteria are plated on LB agar dishes. Six of the resulting bacterial clones are tested for their ability to produce a fusion protein with an insulin constituent. For this purpose, overnight cultures of the clones are prepared in LB medium, and 100 µl aliquots of the cultures are mixed with 10.5 ml TP medium and shaken at 37°C. At OD600 = 1 the cultures are adjusted to 20 µg/ml 3-β-indolylacrylic acid (IAA), a solution of 40 mg glucose in 100 ml water is added and the preparation is shaken for another three hours at 37°C. Subsequently 6 OD equivalents of the culture are removed, the bacteria contained therein are harvested by centrifugation and resuspended in 300 µl test buffer (37.5 mM tris of pH 6.5, 7 M urea, 1% (w/v) SDS and 4% (v/v) 2-mercaptoethanol). The suspension is heated for five minutes, treated for two seconds with ultrasound to reduce viscosity, and aliquots thereof are subsequently subjected to SDS-gel electrophoresis. With bacteria that produce fusion protein, we can expect a protein band with a molecular weight of 10,350 D. It is evident that one of the clones, pINT41 (Table 1), produces an appropriate protein in relatively large quantities while no such protein formation is seen with the remaining clones. An immunoblot experiment with insulin-specific antibodies confirms that the protein coded by pINT41 contains an insulin constituent.

Table 1 shows the DNA and amino acid sequence of the ballast constituent for a number of plasmid constructs. In particular, Table 1 illustrates the DNA and amino acid sequence of the ballast constituent in the pINT41 fusion protein.






EP 0 489 780 B1 

Table 1 



 1   2   3   4   5   6   7   8   9   10  11

Met Ala Thr Thr Ser Thr Ala Thr Thr Arg

ATG GCA ACA ACA TCA ACA GCA ACT ACG CGT

Thr Ser Thr 

*** **G A*T T*G A*G **G *** *** 

Ala Thr Ser Thr Ser 

*** **T G** *** A*T T*T A*T T*A *** *** 

Asn Ser 

*** *** *** AAC T*A *** *** 

ieitie *** *it* *** *★* **A 

*** it-kit *** 4r*A 

itit* *** ititic 

Gly Asn Ser Ala 
**★ ititit *** *** *** *G* *A* T** GCA **A 

Lys 

*★* *** ititit *** *** AA* ititp^ 

Pro 

*** *** ititit ititit *** *** C** — **A 

Met 

*** *** ititit ititit *** *** ATG **A 
Gly

★ ie** ifk* **★ -kit* *G* **A 96d 

Example 2

Selection of additional clones

To detect additional suitable clones, a method according to Helfman, D.M. et al. (Proc. Natl. Acad. Sci. USA 80: 31-35, 1983) is used. TP-agar dishes, the medium of which contains an additional 40 µg/ml IAA, are utilized for this purpose. Fifteen minutes before use, the agar surface of the plates is coated with a 2-mm thick TP top agar layer, a nitrocellulose filter is placed on the latter and freshly transformed cells are placed on the filter. Copies are made of the filters which have grown bacteria colonies following incubation at 37°C, and the bacteria from the original filter are lysed. To accomplish this, the filters are exposed to a chloroform atmosphere in a desiccator for 15 minutes, subsequently moved slowly for six hours at room temperature in immune buffer (50 mM tris of pH 7.5, 150 mM NaCl, 5 mM MgCl2, and 3% (w/v) BSA), which contains an additional 1 µg/ml DNase I and 40 µg/ml lysozyme, and then washed twice for five minutes in washing buffer (50 mM tris of pH 7.5 and 150 mM NaCl). The filters are then incubated overnight at 3°C in immune buffer with insulin-specific antibodies, washed four times for five minutes with washing buffer, incubated for one hour in immune buffer with a protein A-horseradish peroxidase conjugate, and washed again four times for five minutes with washing buffer; colonies that have bound antibodies are then visualized with a color reaction. Clones pINT42 and pINT43, which also produce fairly large quantities of fusion protein, are found in this manner among 500 colonies. The DNA sequences obtained by sequencing and the amino acid sequences derived from them have also been reproduced in Table 1.

Example 3

Preparation of plasmid pINT41d

Between the replication origin and the trp-promoter, plasmid pINT41 contains a nonessential DNA region which is flanked by cleavage sites for enzyme Nsp(7524)I. To remove this region from the plasmid, pINT41 is cut with Nsp(7524)I, and the larger of the resulting fragments is isolated and religated. This gives rise to plasmid pINT41d, the DNA sequence of which is reproduced in Table 2.

Table 2: DNA sequence of plasmid pINT41d



10 30 50 

GTGTCATGGTCGGTGATCGCCAGGGTGCCGACGCGCATCTCGACTTGCACGGTGCACCAA 

70 90 110 

TGCTTCTGGCGTCAGGCAGCCATCGGAAGCTGTGGTATGGCTGTGCAGGTCGTAAATCAC 

130 150 170 

TGCATAATTCGTGTCGCTCAAGGCGCACTCCCGTTCTGGATAATGTTTTTTGCGCCGACA 

190 210 230 

TCATAACGGTTCTGGCAAATATTCTGAAATGAGCTGTTGACAATTAATCATCGAACTAGT 

250 270 290 

TAACTAGTACGCAAGTTCACGTAAAAAGGGTATCGACCATGGCAACAACATCAACAGCAA 

310 330 350 

CTACGCGTTTCGTGAACCAGCACCTGTGCGGCTCCCACCTAGTGGAAGCTCTCTACCTGG 

370 390 410 

TGTGCGGGGAGCGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCCTC 

430 450 470 

AGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTGGCGC 

490 510 530 

TGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGCACCAGCATCTGCTCCC 

550 570 590 

TCTACCAGCTGGAGAACTACTGCAACTAATAGTCGACCTTTGCTTTCATTGTCGATGATA 

610 630 650 

AGCTGTCAAACATGAGAATTAGCCCGCCTAATGAGCGGGCTTTTTTTTAATTCTTGAAGA 

670 690 710 

CGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTTTCT 

730 750 770 

TAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTC 

790 810 830 

TAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAA 

850 870 890 

TATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTT 

910 930 950 

GCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCT 

970 990 1010 

GAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATC 

1030 1050 1070 

CTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTA 

1090 1110 1130 

TGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATACAC 

1150 1170 1190 

TATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGC 

1210 1230 1250 

ATGACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAAC 

1270 1290 1310 

TTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGG 

1330 1350 1370 

GATCATGTAACTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGAC 

1390 1410 1430 

GAGCGTGACACCACGATGCCTGCAGCAATGGCAACAACGTTGCGCAAACTATTAACTGGC 
1450 1470 1490

GAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTT 

1510 1530 1550

GCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGA 

1570 1590 1610 

GCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCC 

1630 1650 1670 

CGTATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAG 

1690 1710 1730 

ATCGCTGAGATAGGTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCA 

1750 1770 1790 

TATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATC 

1810 1830 1850 

CTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCA 

1870 1890 1910 

GACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGC 

1930 1950 1970 

TGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTA 

1990 2010 2030 

CCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTT 

2050 2070 2090 

CTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC 

2110 2130 2150 

GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGG 

2170 2190 2210 

TTGGACTCAAGACGATAGTTACCGGTAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGT 

2230 2250 2270 

GCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGC 

2290 2310 2330 

ATTGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCA 

2350 2370 2390 

GGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATA 

2410 2430 2450 

GTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGG 

2470 2490 2510 

GGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCT 

2530 2550 2570 

GGCCTTTTGCTCACATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGC 

TG 



Example 4 

Fermentation and processing of plNT41d-fusion protein 

(i) Fermentation: A shaking culture in LB medium is prepared from E. coli W3110 transformed with pINT41d. Fifteen µl of this culture, which has an OD = 2, are then put into 15.7 l TP medium and the suspension is fermented for 16 hours at 37°C. The culture, which at this time has an OD = 13, is then adjusted to 20 µg/ml IAA, and until the end of fermentation, after another five hours, a 50% (w/v) maltose solution is continuously pumped in at a rate of 100 ml/hour. An OD = 17.5 is attained in this process. At the end, the bacteria are harvested by centrifugation.

(ii) Rupture of cells: The cells are resuspended in 400 ml disintegration buffer (10 mM tris of pH 8.0, 5 mM EDTA) and disrupted in a French press. The fusion protein containing insulin is subsequently concentrated by 30 minutes of centrifugation at 23,500 g and washed with disintegration buffer. This yields 134 g sediment (moist substance).

(iii) Sulfitolysis: 12.5 g sediment (moist substance) from (ii) are stirred into 125 ml of an 8 M urea solution at 35°C. After stirring for thirty minutes, the solution is adjusted to pH 9.5 with sodium hydroxide solution and reacted with 1 g sodium sulfite. After an additional thirty minutes of stirring at 35°C, 0.25 g sodium tetrathionate is added and the mixture is again stirred for thirty minutes at 35°C.

(iv) DEAE anion exchange chromatography: The entire batch of (iii) is diluted with 250 ml buffer A (50 mM glycine, pH 9.0) and placed on a chromatography column which contains Fractogel® TSK DEAE-650 (column volume 130 ml, column diameter 26 mm) equilibrated with buffer A. After washing with buffer A, the fusion protein-S-sulfonate is eluted with a salt gradient consisting of 250 ml each of buffer A and buffer B (50 mM glycine of pH 9.0, 3 M urea and 1 M NaCl) at a flow rate of 3 ml/minute. The fractions containing fusion protein-S-sulfonate are then combined.

(v) Folding and enzymatic cleavage: The combined fractions from (iv) are diluted at 4°C in a volume ratio of 1 + 9 with folding buffer (50 mM glycine, pH 10.7), and per liter of the resulting dilution 410 mg ascorbic acid and 165 µl 2-mercaptoethanol are added at 4°C under gentle stirring. After correction of the pH value to pH 10.5, stirring is continued for another 4 hours at 4°C. Subsequently, solid N-(2-hydroxyethyl)-piperazine-N'-2-ethane sulfonic acid (HEPES) is added to an end concentration of 24 g per liter of batch. The mixture, which now has pH 8, is digested with trypsin at 25°C. During the process, the enzyme concentration in the digestion mixture is 80 µg/l. The cleavage course is followed analytically by RP-HPLC. After two hours, digestion can be stopped by addition of 130 µg soy bean trypsin inhibitor. HPLC shows the formation of 19.8 mg di-Arg insulin from a mixture according to (iii). The identity of the cleavage product is confirmed by protein sequencing and comparative HPLC with reference substances. The di-Arg insulin can be chromatographically purified according to known methods and transformed to insulin with carboxypeptidase B.

Example 5

Construction of plasmid pINT60

Plasmid pINT60 results in an insulin precursor, the ballast sequence of which consists of only nine amino acids.
For construction of this plasmid, plasmid pINT40 is cut with Nco and MluI and the resulting vector fragment is isolated.
The oligonucleotide Insu15



TTCGGGTACCGTTGTTGTAGTTTGAGTTGCGCAG 5'

TTGCCCATGGC 3'

is then synthesized, filled in with Klenow polymerase and also cut with these two enzymes. The resulting DNA fragment
is then ligated with the vector fragment to yield plasmid pINT60.
Table 1 shows the DNA and amino acid sequence of the ballast constituent in this fusion protein.

Example 6 

Construction of plasmid pINT67d

Plasmid pINT67d is a derivative of pINT41d in which the codon of the amino acid in position nine of the ballast
sequence is deleted. Like pINT60, it therefore results in an insulin precursor with a ballast sequence of nine amino
acids. A method according to Ho, S.N. et al. (Gene 77:51-59, 1989) is used for its construction. For this purpose, two
separate PCRs are first performed with plasmid pINT41d and the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR8: 5'-CAC AAA TCG AGT TGC TGT TGA TGT TGT-3'

or

DTR9: 5'-ACA GCA ACT CGA TTT GTG AAC GAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.



This produces two fragments that are partially complementary to each other and which, when annealed with each other,
code for an insulin precursor similar to that of pINT41d in which, however, the amino acid in position nine is absent.
For completion, the two fragments are combined and subjected to another PCR together with the oligonucleotides TIR
and Insu11. From the DNA fragment obtained in this manner, the structural gene of the insulin precursor is liberated
with Nco and SalI and purified. Plasmid pINT41d is then also cut with these two enzymes, the vector fragment is
purified and subsequently ligated with the structural gene fragment from the PCR to yield plasmid pINT67d.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.
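
The partial complementarity of the two PCR fragments can be checked directly from the inner primers. The following Python sketch is an illustration only (the helper names are not from the source): it takes DTR8 and DTR9 as printed above, forms the reverse complement of DTR8 (the top-strand 3' end of the first fragment) and looks for the longest stretch it shares with the 5' end of DTR9.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left, right):
    """Longest suffix of `left` that is also a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:n]):
            return right[:n]
    return ""


# Inner primers of Example 6 as printed in the text (spaces removed).
DTR8 = "CACAAATCGAGTTGCTGTTGATGTTGT"
DTR9 = "ACAGCAACTCGATTTGTGAACGAGCAC"

# The first fragment (TIR/DTR8) ends, on its top strand, with the reverse
# complement of DTR8; the second fragment (DTR9/Insu11) starts with DTR9.
top_end_of_fragment_1 = reverse_complement(DTR8)
overlap = longest_overlap(top_end_of_fragment_1, DTR9)

if __name__ == "__main__":
    print(top_end_of_fragment_1)
    print(overlap, len(overlap))
```

With the sequences as printed, the two fragments share 18 nucleotides, which is the region through which they anneal in the final PCR with TIR and Insu11.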

Example 7 

Construction of plasmid pINT68d

Like plasmid pINT67d, plasmid pINT68d is a shortened derivative of plasmid pINT41d in which the codons of the
two amino acids in positions eight and nine of the ballast sequence are deleted. It therefore results in an insulin precursor
with a ballast sequence of only eight amino acids. The procedure previously described in Example 6 is used for its
construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR10: 5'-CAC AAA TCG TGC TGT TGA TGT TGT TGC-3'

or

DTR11: 5'-TCA ACA GCA CGA TTT GTG AAC CAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.

Example 8 

Construction of plasmid pINT69d

Plasmid pINT69d is also a shortened derivative of plasmid pINT41d in which, however, the codons of the three
amino acids in positions seven, eight and nine of the ballast sequence have been deleted. It therefore results in an
insulin precursor with a ballast sequence of only seven amino acids. The procedure described in Example 6 is also
used for its construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

DTR12: 5'-CAC AAA TCG TGT TGA TGT TGT TGC CAT-3'

or

DTR13: 5'-ACA TCA ACA CGA TTT GTG AAC CAG CAC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

The nucleotide and amino acid sequences for the ballast region have been reproduced in Table 1.

Example 9 

Construction of plasmid pINT72d

Plasmid pINT72d is a derivative of plasmid pINT69d in which the entire C-peptide gene region, with the exception
of the first codon for the amino acid arginine, is deleted. Consequently, this results in a "miniproinsulin derivative" with
an arginine residue instead of a C-chain. With plasmid pINT69d as a starting point, the procedure described in Example
6 is also used for its construction but with the two oligonucleotide pairs

TIR: 5'-CTG AAA TGA GCT GTT GAC-3'

and

Insu28: 5'-GAT GCC GCG GGT CTT GGG TGT-3'

or

Insu27: 5'-AAG ACC CGC GGC ATC GTG GAG-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.

Example 10 

Construction of plasmids pINT73d, pINT88d and pINT89d

Plasmid pINT73d is a derivative of plasmid pINT69d (Example 8), in which the insulin precursor gene is arranged
two times in succession. The plasmid therefore results in the formation of a polycistronic mRNA, which can double the
yield. For its construction, a PCR is carried out with plasmid pINT69d and the two oligonucleotides

Insu29: 5'-CTA GTA CTC GAG TTC AC-3'

and

Insu11: 5'-TCA TGT TTG ACA GCT TAT CAT-3'.



This gives rise to a fragment with the insulin precursor gene and the pertinent ribosome binding site which in its
5'-end region has a cleavage site for enzyme XhoI and in its 3'-end region a cleavage site for SalI. The fragment is cut
with the two above-mentioned enzymes and purified. Plasmid pINT69d is then linearized with SalI, the two DNA ends
produced are dephosphorylated with phosphatase (from calf intestine) and ligated with the fragment from the PCR
to yield plasmid pINT73d.

Plasmids pINT88d and pINT89d are obtained in an analogous manner when plasmid pINT72d (Example 9)
is modified by arranging the "miniproinsulin gene" twice or thrice in sequence.

Example 11 

Construction of plasmid pINL41d

The starting plasmid pRUD3 has a structure analogous to that of plasmid pGATTR. However, instead of the
trp-promoter region, it contains a tac-promoter region which is flanked by cleavage sites for enzymes EcoRI and Nco. The
plasmid is cut with EcoRI, whereupon the protruding ends of the cleavage site are filled in with Klenow polymerase.
Cutting is performed subsequently with Nco and the ensuing promoter fragment is isolated.

The trp-promoter of plasmid pINT41d is flanked by cleavage sites for enzymes PvuII and Nco. Since the plasmid
has an additional cleavage site for PvuII, it is completely cut with Nco, but only partially with PvuII. The vector fragment,
which is missing only the promoter region, is then isolated from the ensuing fragments. This is then ligated with the
tac-promoter fragment to yield plasmid pINL41d.

Example 12 

Construction of plasmid pL41c

Plasmid pPL-lambda (which can be obtained from Pharmacia) has a lambda-pL-promoter region. The latter is
flanked by the nucleotide sequences

5' GATCTCTCACCTACCAAACAAT 3'

and

5' AGCTAACTGACAGGAGAATCC 3'.

Oligonucleotides

LPL3: 5' ATGAATTCGATCTCTCACCTACCAAACAAT 3'

and

LPL4: 5' TTGCCATGGGGATTCTCCTGTCAGTTAGCT 3'

are prepared for additional flanking of the promoter region with cleavage sites for enzymes EcoRI and Nco. A PCR is
carried out with these oligonucleotides and pPL-lambda and the resulting promoter fragment is cut with EcoRI and Nco
and isolated. Plasmid pINL41d is then also cut with these two enzymes and the ensuing vector fragment, which has no
promoter, is then ligated with the lambda-pL-promoter fragment to yield plasmid pL41c.
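
The design of the two PCR primers can be made explicit: each consists of a short 5' clamp, a restriction site and a stretch that anneals to one of the flanking sequences. The following Python sketch is an illustration under that reading; the recognition sequences GAATTC (EcoRI) and CCATGG (Nco) are standard, while the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_site(primer, site):
    """Split a primer into (5' clamp, annealing part) around a restriction site."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    return primer[:start], primer[start + len(site):]


ECORI_SITE = "GAATTC"
NCO_SITE = "CCATGG"

# Flanking sequences of the lambda-pL promoter region as given in the text.
FLANK_UPSTREAM = "GATCTCTCACCTACCAAACAAT"
FLANK_DOWNSTREAM = "AGCTAACTGACAGGAGAATCC"

LPL3 = "ATGAATTCGATCTCTCACCTACCAAACAAT"
LPL4 = "TTGCCATGGGGATTCTCCTGTCAGTTAGCT"

lpl3_clamp, lpl3_anneal = split_at_site(LPL3, ECORI_SITE)
lpl4_clamp, lpl4_anneal = split_at_site(LPL4, NCO_SITE)

if __name__ == "__main__":
    # The forward primer anneals with the upstream flank itself.
    print(lpl3_clamp, lpl3_anneal == FLANK_UPSTREAM)
    # The reverse primer anneals with the reverse complement of the downstream flank.
    print(lpl4_clamp, lpl4_anneal == reverse_complement(FLANK_DOWNSTREAM))
```

Both comparisons hold for the sequences as printed, so the amplified promoter fragment carries an EcoRI site upstream and an Nco site downstream, as the text describes.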

Example 13 

Construction of plasmid pL41d

The trp-transcription terminator located between the resistance gene and the fusion protein gene in plasmid pL41c
is not effective in E. coli strains that are suitable for fermentation (e.g. E. coli N4830-1). For this reason, a polycistronic
mRNA and with it a large quantity of resistance gene product are formed in fermentation. To prevent this side reaction,
the trp-terminator sequence is replaced by an effective terminator sequence of the E. coli rrnB operon. Plasmid pANGMA
has a structure similar to that of plasmid pINT41d, but it has an angiogenin gene instead of the fusion protein gene
and an rrnB-terminator sequence (from commercial plasmid pKK223-3, which can be obtained from Pharmacia) instead
of the trp-terminator sequence. The plasmid is cut with PvuI and SalI and the fragment containing the rrnB-terminator
is isolated. Plasmid pL41c is then also cut with these two enzymes and the fragment containing the insulin gene is
isolated. The two isolated fragments are then ligated to yield plasmid pL41d.

Example 14 

Construction of plasmid pINTLI

To prepare a plasmid for general use in the expression of fusion proteins, the proinsulin gene of plasmid pINT41d
is replaced by a polylinker sequence. This gene is flanked by cleavage sites for enzymes MluI and SalI. The plasmid
is therefore cut with the two above-mentioned enzymes and the vector fragment is isolated. This is then
ligated, to yield plasmid pINTLI, with the following two synthetic oligonucleotides

          BstEII      AccI     EcoRI       KpnI     BamHI
5' CGCGCCTGGTTACCTCGAGGTATACTACGAATTCCAGCTCGGTACCCGGGGATCC
3'     GGACCAATGGAGCTCCATATGATGCTTAAGCTCGAGCCATGGGCCCCTAGG
                XhoI                 SacI      XmaI

         SphI         XbaI
   CTGCAGGCATGCAAGCTTGTCTAGAC-3'
   GACGTCCGTACCTTCGAACAGATCTGAGCT-5'
   PstI        HindIII       (SalI).

Example 15 

Insertion of a gene coding for HMG CoA-reductase (active domain) in pINTLI and expression of the fusion protein

Table 3 represents the DNA and amino acid sequence of the gene HMG CoA-reductase. The synthetic gene for
HMG CoA-reductase known from EP-A 0 292 803 (herein incorporated by reference) contains a cleavage site for
BstEII in the region of amino acids Leu and Val in positions 3 and 4 (see Table 3). A protruding sequence corresponding
to enzyme XbaI occurs at the end of the gene (in the noncoding area). The corresponding cleavage sites in the polylinker
of plasmid pINTLI are in the same reading frame. Both cleavage sites are in each case unique.

Plasmid pUH10 contains the complete HMG gene (HMG fragments I, II, III, and IV), corresponding to the DNA
sequence of Table 3. Construction of pUH10 (figure 2) is described in EP-A 0 292 803, herein incorporated by reference.
Briefly, special plasmids are prepared for the subcloning of the gene fragments HMG I to HMG IV and for the construction
of the complete gene. These plasmids are derived from the commercially available vectors pUC18, pUC19 and
M13mp18 or M13mp19, with the polylinker region having been replaced by a new synthetic polylinker corresponding
to DNA sequence VI



         Nco               EcoRI    HindIII  BamHI     XbaI
                                                                    VI-1a
5' AAT TGC CAT GGC CAT GCG GAA TTC CAA GCT TTG GAT CCA TCT AGA GGC

3'      CG GTA CCC CTA CCC CTT AAG GTT CGA AAC CTA CGT AGA TCT CCC TCG A
                                                                    VI-1b

These new plasmids have the advantage that, in contrast to the pUC and M13mp plasmids, they allow the cloning
of DNA fragments having the protruding sequences for the restriction enzyme Nco. Moreover, the recognition sequences
for the cleavage sites Nco, EcoRI, HindIII, BamHI, and XbaI are contained in the vectors in exactly the sequence
in which they are present in the complete gene HMG, which facilitates the sequential cloning and the construction of
this gene. Thus it is possible to subclone the gene fragments HMG I to HMG IV in the novel plasmids. After the gene
fragments have been amplified, it is possible for the latter to be combined to give the complete gene (see below).

a. Preparation of vectors which contain DNA sequence VI

DNA sequence VI may be prepared by standard techniques. The commercially available plasmid pUC18 (or
pUC19, M13mp18 or M13mp19) is opened with the restriction enzymes EcoRI/HindIII as stated by the manufacturer.
The digestion mixture is fractionated by electrophoresis on a 1% agarose gel. The plasmid bands which have been
visualized by ethidium bromide staining are cut out and eluted from the agarose by electrophoresis. 20 fmol of the
residual plasmid thus obtained are then ligated with 200 fmol of the DNA fragment corresponding to DNA sequence
VI at room temperature overnight. A new cloning vector pSU18 (or pSU19, M13mUS18 or M13mUS19) is obtained.
In contrast to the commercially available starting plasmids, the new plasmids can be cut with the restriction enzyme
Nco. The restriction enzymes EcoRI and HindIII likewise cut the plasmids only once because the polylinker which is
inserted via the EcoRI and HindIII cleavage sites destroys these cleavage sites which are originally present.

b. Preparation of the hybrid plasmids which contain the gene fragments HMG I to HMG IV.

i) Plasmid containing the gene fragment HMG I

The plasmid pSU18 is cut open with the restriction enzymes EcoRI and Nco in analogy to the description in Example
15 (a) above, and is ligated with the gene fragment I which has previously been phosphorylated.

ii) Plasmid containing the gene fragment HMG II

The plasmids with the gene subfragments HMG II-1, II-2 and II-3 are subjected to restriction enzyme digestion
with EcoRI/MluI, MluI/BssHII or BssHII/HindIII to isolate the gene fragments HMG II-1, HMG II-2 or HMG II-3, respectively.
The latter are then ligated in a known manner into the plasmid pSU18 which has been opened with EcoRI/HindIII.

iii) Plasmid containing the gene fragment HMG III

The plasmids with the gene subfragments HMG III-1 and III-3 are digested with the restriction enzymes EcoRI/HindIII
and then cut with Sau96I to isolate the gene fragment HMG III-1, or with BamHI/BanII to isolate the gene
fragment HMG III-3. These fragments can be inserted with the HMG III-2 fragment into a pSU18 plasmid which has
been opened with HindIII/BamHI.

iv) Plasmid containing the gene fragment HMG IV

The plasmids with the gene subfragments HMG IV-(1+2) and IV-(3+4) are opened with the restriction enzymes
EcoRI/BamHI and EcoRI/XbaI, respectively, and the gene fragments HMG IV-(1+2) and HMG IV-(3+4) are purified by
electrophoresis. The resulting fragments are then ligated into a pSU18 plasmid which has been opened with BamHI/XbaI
and in which the EcoRI cleavage site has previously been destroyed with S1 nuclease as described below. A
hybrid plasmid which still contains an additional AATT nucleotide sequence in the DNA sequence IV is obtained. The
hybrid plasmid is opened at this point by digestion with the restriction enzyme EcoRI, and the protruding AATT ends
are removed with S1 nuclease. For this purpose, 1 µg of plasmid is, after EcoRI digestion, incubated with 2 units of
S1 nuclease in 50 mM sodium acetate buffer (pH 4.5), containing 200 mM NaCl and 1 mM zinc chloride, at 20°C for
30 minutes. The plasmid is recircularized in a known manner via the blunt ends. A hybrid plasmid which contains the
gene fragment IV is obtained.

c. Construction of the hybrid plasmid pUH10 which contains the DNA sequence V

The hybrid plasmid with the gene fragment HMG I is opened with EcoRI/HindIII and ligated with the fragment HMG
II which is obtained by restriction enzyme digestion of the corresponding hybrid plasmid with EcoRI/HindIII. The resulting
plasmid is then opened with HindIII/BamHI and ligated with the fragment HMG III which can be obtained from
the corresponding plasmid using HindIII/BamHI. The plasmid obtained in this way is in turn opened with BamHI/XbaI
and linked to the fragment HMG IV which is obtained by digestion of the corresponding plasmid with BamHI/XbaI. The
hybrid plasmid pUH10 which contains the complete HMG gene, corresponding to DNA sequence V, is obtained. Figure
2 shows the map of pUH10 diagrammatically, with "ori" and "Apr" indicating the orientation in the residual plasmid
corresponding to pUC18.

If pINTLI is cut with BstEII and XbaI and the large fragment is isolated, and if, on the other hand, plasmid pUH10
(figure 2) is digested with the same enzymes and the fragment which encompasses most of DNA sequence V from
this plasmid is isolated, ligation of the two fragments yields a plasmid which codes for a fusion protein in which
arginine follows the first eight amino acids in the ballast sequence of pINT41d (Table 1), which is followed, starting with
Leu3, by the structural gene of the active domain of HMG CoA-reductase. For purposes of comparison, the two initial
plasmids are cut with enzymes Nco and XbaI and the corresponding fragments are ligated together, yielding a plasmid
which codes, immediately after the start codon, for the active domain of HMG CoA-reductase (in accordance with DNA
sequence V of EP-A 0 292 803, see Table 3).

Expression of the coded proteins occurs according to Example 4. Following the breakup of the cells, centrifugation
is performed, whereupon the expected protein of approximately 55 kDa is detected in the supernatant by gel electrophoresis.
The band for the fusion protein is much more intense here than for the protein expressed directly. Individual
portions of 100 µl of the supernatant are tested in undiluted form, in a dilution of 1:10 and in a dilution of 1:100
for the formation of mevalonate. As an additional comparison, the fusion protein according to Example 4 (fusion protein
with proinsulin constituent) is tested; no activity is apparent in any of the three concentrations. The fusion protein with
the HMG CoA-reductase constituent exhibits maximum activity in all three dilutions, while the product of the direct
expression shows graduated activity governed by the concentration. This indicates better expression of the fusion
protein by a factor of at least 100.

Example 16

Construction of plasmid pB70

Plasmid pINT41d is split with MluI and SalI and the large fragment is isolated. Plasmid pIK4, shown in figure 3a,
contains a gene for "mini-proinsulin", the C chain of which consists of arginine only.

The construction of this plasmid has previously been described in EP-A 0 347 781 (herein incorporated by reference).
Briefly, the commercial plasmid pUC19 is opened using the restriction enzymes KpnI and PstI and the large
fragment (figure 3-(1)) is separated through a 0.8% strength "Seaplaque" gel. This fragment is reacted with T4 DNA
ligase using the DNA (figure 3-(2)) synthesized according to Table 4. Table 4 shows the sequence of gene fragment
IK I, while Table 5 represents the sequence of gene fragment IK II.

This ligation mixture is then incubated with competent E. coli 79/02 cells. The transformation mixture is plated out
on IPTG/Xgal plates which contain 20 mg/l of ampicillin. The plasmid DNA is isolated from the white colonies and
characterized by restriction and DNA sequence analysis. The desired plasmids are called pIK1 (figure 3).

Accordingly, the DNA (figure 3-(5)) according to Table 5 is ligated into pUC19 which has been opened using PstI
and HindIII (figure 3-(4)). The plasmid pIK2 (figure 3) is obtained.

The DNA sequences (2) and (5) of figure 3 according to Tables 4 and 5 are reisolated from the plasmids pIK1 and
pIK2 and ligated with pUC19, which has been opened using KpnI and HindIII (figure 3-(7)). The plasmid pIK3 (figure
3) is thus obtained, which encodes a modified human insulin sequence.

The plasmid pIK3 is opened using MluI and SpeI and the large fragment (figure 3a-(9)) is isolated. This is ligated
with the DNA sequence (10)



B30         A1  A2  A3  A4  A5  A6  A7  A8    A9
(Thr) (Arg) Gly Ile Val Glu Gln Cys Cys (Thr) (Ser)    (10)
5'  CG CGT  GGT ATC GTT GAA CAA TGT TGT A           3'
3'       A  CCA TAG CAA CTT GTT ACA ACA TGA TC      5'
   (MluI)                                (SpeI)

which supplements the last codon of the B chain (B30) by one arginine codon, replaces the excised codons for the
first 7 amino acids of the A chain and supplements the codons for amino acids 8 and 9 of this chain. The plasmid
pIK4 (figure 3a) is thus obtained, which encodes human mini-proinsulin.

In Tables 4 and 5, the B- and A-chains of the insulin molecule are in each case indicated by the first and last amino
acid. Next to the coding region in gene fragment IK II, there is a cleavage site for SalI which will be utilized in the
following construction.

Plasmid pIK4 is cut with HpaI and SalI and the gene coding for "mini-proinsulin" is isolated. This gene is ligated with
the above-mentioned large fragment of pINT41d and the following synthetic DNA sequence.

                B1
Arg Met Gly Arg Phe
CGT ATG GGC CGT TTC GTT
  A TAC CCG GCA AAG CAA
(MluI)           (HpaI)

This gives rise to plasmid pB70, which codes for a fusion protein in which the ballast sequence (Table 1, line 1) is followed
by amino acid sequence Met-Gly-Arg, which is followed by the amino acid sequence of the "mini-proinsulin".
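
The amino acids encoded by the synthetic linker can be read off by translating its coding strand. The following Python sketch is an illustration only: it uses the standard genetic code, takes the coding strand CGT ATG GGC CGT TTC GTT of the linker, and the helper names are not part of the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_strand):
    """Translate a 5'->3' coding strand into one-letter amino acid codes."""
    seq = coding_strand.replace(" ", "").upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Coding strand of the MluI/HpaI linker used for pB70.
LINKER = "CGT ATG GGC CGT TTC GTT"

if __name__ == "__main__":
    print(translate(LINKER))
```

The result reads Arg-Met-Gly-Arg-Phe-Val: the arginine that closes the ballast sequence, the inserted Met-Gly-Arg, and the first two residues (Phe, Val) of the B chain.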



TABLE 4: Gene fragment IK I (2)

                        B1
                        Phe
                10           20            30           40
     CT TTG GAC AAG AGA TTC GTT AAC CAA CAC TTG TGT GGT TCT CAC
CAT GGA AAC CTG TTC TCT AAG CAA TTG GTT GTG AAC ACA CCA AGA GTG
(KpnI)                      HpaI

     50            60           70           80            90
TTG GTG GAA GCG TTG TAC TTG GTT TGT GGT GAG CGT GGT TTC TTC
AAC CAC CTT CGC AAC ATG AAC CAA ACA CCA CTC GCA CCA AAG AAG

                B30
                Thr Arg Lys Gly Ser Leu
           100          110           120
TAC ACT CCA AAG ACG CGT AAG GGT TCT CTG CA
ATG TGA GGT TTC TGC GCA TTC CCA AGA G
                MluI                (PstI)

TABLE 5: Gene fragment IK II (5)

   Gln Lys Arg Gly
              130          140           150          160
     G AAG CGT GGT ATC GTT GAA CAA TGT TGT ACT AGT ATC TGT TCT
AC GTC TTC GCA CCA TAG CAA CTT GTT ACA ACA TGA TCA TAG ACA AGA
(PstI)                                     SpeI

                                A21
                                Asn
    170           180          190          200           210
TTG TAC CAG CTG GAA AAC TAC TGT AAC TGA TAG TCG ACC CAT GGA
AAC ATG GTC GAC CTT TTG ATG ACA TTG ACT ATC AGC TGG GTA CCT TCG A
                                                        (HindIII)



Example 17 

By using the oligonucleotides listed below, plasmids pINT90d to pINT96d are obtained in analogy to the
previous examples. An asterisk indicates the same encoded amino acid in the ballast constituent as in pINT41d.

pINT92d encodes a double mutation in the insulin derivative encoded by the plasmid pINT72d, since the codon for
Arg at the end of the ballast constituent and in the "mini C chain" is substituted by the codon for Met. Thus the expressed
preproduct can be cleaved with cyanogen bromide.



pINT90d: ******GNSA* (variant of pINT69d)

TIR: 5'-CTGAAATGAGCTGTTGAC-3'
and
Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'
or
Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT91d: ******GNSA* (variant of pINT72d)

TIR: 5'-CTGAAATGAGCTGTTGAC-3'
and
Insu50: 5'-TGCCGAATTTCCTGTTGATGTTGTTGC-3'
or
Insu49: 5'-GGAAATTCGGCACGATTTGTGAACCAG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT92d: (double mutant of pINT72d)

Insu56: 5'-TCGACCATGGCAACAACATCAACAATGTTTGTG-3'
and
Insu58: 5'-GATGCCCATGGTCTT-3'
or
Insu57: 5'-AAGACCATGGGCATC-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT93d: ****** (variant of pINT68d)

Insu53: 5'-ACCATGGCAACAACATCAACAAAACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT94d: ****** (variant of pINT68d)

Insu54: 5'-ACCATGGCAACAACATCAACACCACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT95d: ****** (variant of pINT68d)

Insu55: 5'-TCGACCATGGCAACAACATCAACAATGCGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

pINT96d: ****** (variant of pINT68d)

Insu71: 5'-ACCATGGCAACAACATCAACAGGACGATTTGTG-3'
and
Insu11: 5'-TCATGTTTGACAGCTTATCAT-3'

Claims 

1. A process for the preparation of fusion proteins, which fusion proteins contain a desired protein and a ballast
constituent, which process comprises

(a) constructing a mixed oligonucleotide which codes for the said ballast constituent, wherein the said oligonucleotide
contains the DNA sequence (coding strand)

(DCD)x

in which D is A, G or T and x is 4 to 12,

(b) inserting the said mixed oligonucleotide into a vector so that it is functionally linked to a regulatory region
and to the structural gene coding for the said desired protein,

(c) transforming host cells with the so-obtained vector population and

(d) selecting from the transformants one or more clones expressing a fusion protein in high yield.

2. The process as claimed in claim 1, wherein the said oligonucleotide codes at its 3' end of the coding strand for an
amino acid or for a group of amino acids which allows an easy cleavage of the said desired protein from the said
ballast constituent.

3. The process as claimed in claim 2, wherein said cleavage is an enzymatic cleavage.

4. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that it leads to a fusion protein
which is soluble or which can easily be solubilized.

5. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that the ballast constituent
does not interfere with folding of the said desired protein.

6. The process as claimed in claim 1, wherein x is 4 to 8.

7. The process as claimed in claim 5, wherein the said oligonucleotide has the sequence (coding strand)

ATG (DCD)y (NNN)z

wherein N in the NNN triplet stands for identical or different nucleotides, excluding stop codons for NNN, z is 1 to
4 and y + z is 6 to 12, y being at least 4.

8. The process as claimed in claim 7, wherein y + z is 6 to 10.

9. The process as claimed in claim 7, wherein y is 5 to 8 and z is 1.

10. The process as claimed in claim 1, wherein the said oligonucleotide has the sequence (coding strand)

ATG GCW (DCD)4-8 CGW

in which W is A or T.

Patent claims

1. A process for the preparation of fusion proteins, which fusion proteins contain a desired protein and a ballast
constituent, which process comprises

(a) constructing a mixed oligonucleotide which codes for the said ballast constituent, wherein the said oligonucleotide
contains the DNA sequence (coding strand)

(DCD)x

in which D stands for A, G or T and x is from 4 to 12;

(b) inserting the said mixed oligonucleotide into a vector so that it is functionally linked to a regulatory region
and to the structural gene which codes for the desired protein;

(c) transforming the host cells with the vector population thus obtained; and

(d) selecting, from the transformants, one or more clones which express a fusion protein in high yield.

2. The process as claimed in claim 1, wherein the said oligonucleotide codes at its 3' end of the coding strand for
an amino acid or a group of amino acids which makes possible an easy cleavage of the desired protein from the
said ballast constituent.

3. The process as claimed in claim 2, wherein the said cleavage is an enzymatic cleavage.

4. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that it leads to a fusion protein
which is soluble or which can easily be solubilized.

5. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that the ballast constituent
does not impair the folding of the said desired protein.

6. The process as claimed in claim 1, wherein x is from 4 to 8.

7. The process as claimed in claim 5, wherein the said oligonucleotide has the sequence (coding strand)

ATG (DCD)y (NNN)z

wherein N in the NNN triplet stands for identical or different nucleotides, stop codons being excluded for NNN,
z is from 1 to 4 and y + z is from 6 to 12, y being at least 4.

8. The process as claimed in claim 7, wherein y + z is from 6 to 10.

9. The process as claimed in claim 7, wherein y is from 5 to 8 and z is 1.

10. The process as claimed in claim 1, wherein the said oligonucleotide has the sequence (coding strand)

ATG GCW (DCD)4-8 CGW

wherein W is A or T.

Claims

1. A process for the preparation of fusion proteins, which fusion proteins contain a desired protein and a ballast
constituent, which process comprises

(a) the construction of a mixed oligonucleotide coding for the said ballast constituent, the said oligonucleotide
containing the DNA sequence (coding strand)

(DCD)x

in which D is A, G or T and x ranges from 4 to 12,

(b) the insertion of the said mixed oligonucleotide into a vector, in such a way that it is functionally linked to a
regulatory region and to the structural gene coding for the said desired protein,

(c) the transformation of host cells with the vector population thus obtained and

(d) the selection, from the transformants, of one or more clones expressing a fusion protein in high yield.

2. The process as claimed in claim 1, wherein the said oligonucleotide codes at its 3' end of the coding strand for an
amino acid or for a group of amino acids allowing an easy separation of the said desired protein
from the said ballast constituent.

3. The process as claimed in claim 2, wherein the said separation is an enzymatic cleavage.

4. The process as claimed in claim 1, wherein the said oligonucleotide is designed so as to lead to a fusion protein
which is soluble or which can easily be solubilized.

5. The process as claimed in claim 1, wherein the said oligonucleotide is designed so that the ballast constituent
does not interfere with the folding of the said desired protein.

6. The process as claimed in claim 1, wherein x ranges from 4 to 8.

7. The process as claimed in claim 5, wherein the said oligonucleotide has the sequence (coding strand)

ATG (DCD)y (NNN)z

in which N in the NNN triplet represents identical or different nucleotides, with the exclusion of stop codons
for NNN, z ranges from 1 to 4 and y + z ranges from 6 to 12, y being at least 4.

8. The process as claimed in claim 7, wherein y + z ranges from 6 to 10.

9. The process as claimed in claim 7, wherein y ranges from 5 to 8 and z is 1.

10. The process as claimed in claim 1, wherein the said oligonucleotide has the sequence (coding strand)

ATG GCW (DCD)4-8 CGW

in which W is A or T.